
Conversation

@cyphar
Member

@cyphar cyphar commented Nov 7, 2025

This was always the intended behaviour but commit 72fbb34 ("rootfs:
switch to fd-based handling of mountpoint targets") regressed it when
adding a mechanism to create a file handle to the target if it didn't
already exist (causing the later stat to always succeed).

A lot of people depend on this functionality, so add some tests to make
sure we don't break it in the future.

Fixes #4971
Fixes: 72fbb34 ("rootfs: switch to fd-based handling of mountpoint targets")
Signed-off-by: Aleksa Sarai [email protected]
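
To make the regression concrete, here is a minimal sketch of the pre-existence check this PR restores. It is written in Go but it is not runc's actual code; the function name and structure are invented for illustration. The idea: a bare tmpfs (one mounted without an explicit mode option) only inherits the mountpoint's permissions when that mountpoint already existed, and otherwise keeps the kernel's tmpfs default of 1777. The regression came from creating the target before the check, which made the "does it already exist" stat always succeed.

```go
// Minimal sketch, not runc's implementation: names and structure are
// invented for illustration. Only copy an existing mountpoint's permission
// bits onto a bare tmpfs; if the runtime had to create the mountpoint
// itself, leave the kernel default (mode 1777) in place.
package main

import (
	"fmt"
	"os"
)

// tmpfsModeOption (hypothetical helper) returns an extra "mode=..." mount
// option for a tmpfs requested without an explicit mode. preExisted reports
// whether the target directory existed before the runtime created it; the
// regression effectively made this always true.
func tmpfsModeOption(target string, preExisted bool) (string, error) {
	if !preExisted {
		// Freshly created mountpoint: add nothing, so the kernel's
		// tmpfs default of 1777 applies.
		return "", nil
	}
	st, err := os.Stat(target)
	if err != nil {
		return "", err
	}
	// Pre-existing mountpoint: preserve its permission bits on the new
	// tmpfs (simplified here to the lower nine bits).
	return fmt.Sprintf("mode=%#o", st.Mode().Perm()), nil
}

func main() {
	opt, err := tmpfsModeOption("/tmp", true)
	fmt.Println("pre-existing target:", opt, err)
	opt, err = tmpfsModeOption("/created-by-runtime", false)
	fmt.Println("runtime-created target:", opt, err)
}
```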

@cyphar cyphar added backport/1.2-todo A PR in main branch which needs to be backported to release-1.2 backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 backport/1.4-todo A PR in main branch which needs to be backported to release-1.4 labels Nov 7, 2025
@cyphar cyphar force-pushed the tmpfs-mode branch 3 times, most recently from 0a6577f to 8d0921c on November 7, 2025 04:31
@cyphar
Member Author

cyphar commented Nov 8, 2025

This patch seems to trigger errors from criu-dev more consistently than the flakes we've been seeing in the past few weeks... Odd...

@lifubang
Member

lifubang commented Nov 8, 2025

> This patch seems to trigger errors from criu-dev more consistently than the flakes we've been seeing in the past few weeks... Odd...

In fact, I suggest removing the criu-dev job from CI (#4983).

@cyphar
Member Author

cyphar commented Nov 8, 2025

I saw that, but the fact it fails more consistently with this change seems to imply it's not the same flake. Normally retrying the job succeeds after one (or maybe two) attempts, but this PR has failed >10 times in a row with multiple tests failing.

Also note that criu-dev will eventually be released and we will have to deal with the same issue in the future, so I don't really see the benefit of just removing the CI job. @kolyshkin has spent some time debugging the issue in checkpoint-restore/criu#2781 and it appears that we found an issue in CRIU thanks to this testing.

@lifubang
Member

lifubang commented Nov 8, 2025

I think we already have a CI job that tests the latest stable version of CRIU, so testing the project in development mode doesn’t really make sense—at least in my opinion. I could be wrong, though!

@rst0git
Contributor

rst0git commented Nov 8, 2025

@cyphar I'm not sure if this is the same error we saw earlier in #4952

> so testing the project in development mode doesn’t really make sense—at least in my opinion

If it makes more sense, we can extend the CI for CRIU to run the runc integration tests.

@cyphar
Member Author

cyphar commented Nov 8, 2025

@rst0git The error I saw yesterday was different and it was CRIU-related; there is a separate flake I need to work around. I'll get you a copy of the logs on Monday.

@lifubang
Member

lifubang commented Nov 8, 2025

not ok 32 checkpoint and restore
# (from function `simple_cr' in file tests/integration/checkpoint.bats, line 130,
#  in test file tests/integration/checkpoint.bats, line 184)
#   `simple_cr' failed
# runc spec (status=0)
#
# runc run -d --console-socket /tmp/bats-run-qpDOrR/runc.zXBAql/tty/sock test_busybox (status=0)
#
# runc state test_busybox (status=0)
# {
#   "ociVersion": "1.2.1+dev",
#   "id": "test_busybox",
#   "pid": 13959,
#   "status": "running",
#   "bundle": "/tmp/bats-run-qpDOrR/runc.zXBAql/bundle",
#   "rootfs": "/tmp/bats-run-qpDOrR/runc.zXBAql/bundle/rootfs",
#   "created": "2025-11-08T01:27:03.914295576Z",
#   "owner": ""
# }
# runc checkpoint --work-path ./work-dir test_busybox (status=1)
# time="2025-11-08T01:27:04Z" level=warning msg="open work-dir/dump.log: no such file or directory"
# time="2025-11-08T01:27:04Z" level=error msg="read unixpacket @->@: EOF"
# --- teardown ---

logs_49433585743.zip

@cyphar
Member Author

cyphar commented Nov 8, 2025

And of course now it passes when I ran it again...

@cyphar
Member Author

cyphar commented Nov 10, 2025

This is ready to review by the way, @opencontainers/runc-maintainers.

Member

@thaJeztah thaJeztah left a comment


LGTM

@cyphar cyphar merged commit eec1f7e into opencontainers:main Nov 10, 2025
54 of 55 checks passed
@cyphar cyphar mentioned this pull request Nov 10, 2025
@cyphar cyphar deleted the tmpfs-mode branch November 10, 2025 16:17
@kolyshkin kolyshkin added backport/1.2-done A PR in main branch which has been backported to release-1.2 backport/1.3-done A PR in main branch which has been backported to release-1.3 backport/1.4-done A PR in main branch which has been backported to release-1.4 and removed backport/1.2-todo A PR in main branch which needs to be backported to release-1.2 backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 backport/1.4-todo A PR in main branch which needs to be backported to release-1.4 labels Nov 10, 2025
edmorley added a commit to heroku/heroku-buildpack-python that referenced this pull request Nov 14, 2025
The Python classic repo's CI just started failing in the
container-test job with:
`mkdir: cannot create directory '/app/.heroku': Permission denied`

eg:
https://github.com/heroku/heroku-buildpack-python/actions/runs/19368179568/job/55418539741

After updating Docker locally, I was able to reproduce the error, and
have found it's due to the recent runc 1.3.3 release:
https://github.com/opencontainers/runc/releases/tag/v1.3.3

This runc release includes a number of security fixes; however, one of
them introduced a regression:
opencontainers/runc#4971

There is a fix for this upstream:
opencontainers/runc#4973

...but it's not released yet.

However, we can work around the issue by explicitly setting the previous
tmpfs permissions using `:mode=1777`:
https://docs.docker.com/engine/storage/tmpfs/#options-for---tmpfs

GUS-W-20221627.
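
For reference, the Docker `--tmpfs ...:mode=1777` workaround corresponds to passing an explicit mode option on the tmpfs mount at the OCI level. The sketch below is illustrative only: the destination path mirrors the error above, the option set is an assumption, and the program is not part of the buildpack; it simply uses the runtime-spec Go types to show what an explicit-mode mount entry looks like.

```go
// Illustrative sketch only: builds the OCI mount entry roughly equivalent to
// Docker's `--tmpfs /app/.heroku:mode=1777` workaround. The path and option
// set are assumptions for the example, not taken from the buildpack.
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	m := specs.Mount{
		Destination: "/app/.heroku",
		Type:        "tmpfs",
		Source:      "tmpfs",
		// Explicit mode, so the resulting permissions do not depend on
		// whether the runtime had to create the mountpoint.
		Options: []string{"nosuid", "nodev", "mode=1777"},
	}
	out, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(out))
}
```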

Labels

backport/1.2-done A PR in main branch which has been backported to release-1.2
backport/1.3-done A PR in main branch which has been backported to release-1.3
backport/1.4-done A PR in main branch which has been backported to release-1.4

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Recent security update changed default permissions of tmpfs

5 participants